Overview

Dataset statistics

Number of variables: 25
Number of observations: 217
Missing cells: 4
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 159.0 KiB
Average record size in memory: 750.1 B
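The percentages in the dataset statistics follow from the raw counts. The sketch below recomputes them from the figures listed above; the variable names are illustrative, and the record-size result is only approximate because the report displays an already rounded total size.

```python
# Figures copied from the dataset statistics above.
n_variables = 25
n_observations = 217
missing_cells = 4
total_size_kib = 159.0

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Approximation only: 159.0 KiB is itself a rounded display value,
# so this lands near, not exactly on, the reported 750.1 B.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: about {avg_record_bytes:.0f} B")
```

Four missing cells out of 5425 is well under a tenth of a percent before rounding, which is why the report shows 0.1%.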

Variable types

Text: 1
Numeric: 9
Categorical: 13
DateTime: 1
Boolean: 1

Dataset

Description: JHB_WRHI_003 - Quality-corrected harmonized data
Creator: RP2 Clinical Data Quality Team
Author: Quality-Checked Data
URL: HEAT Research Projects

Variable descriptions

Age (at enrolment): Patient age at study enrollment
CD4 cell count (cells/µL): CD4+ T lymphocyte count (missing codes removed)
HIV viral load (copies/mL): HIV RNA copies per mL (missing codes removed)
BMI (kg/m²): Body Mass Index (extreme values removed)
Waist circumference (cm): Waist circumference (corrected from mm to cm)
ALT (U/L): Alanine aminotransferase (missing codes removed)
Platelet count (×10³/µL): Platelet count (missing codes removed)
Hematocrit (%): Hematocrit (zero values removed)
Lymphocyte count (×10⁹/L): Lymphocyte absolute count (corrected labeling)
Neutrophil count (×10⁹/L): Neutrophil absolute count (corrected labeling)
cd4_correction_applied: Quality flag: CD4 missing codes removed
final_comprehensive_fix_applied: Quality flag: Comprehensive corrections applied
waist_circ_unit_correction_applied: Quality flag: Waist circ unit corrected
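The descriptions above mention a millimetre-to-centimetre correction paired with a quality flag. The report does not say how that correction was implemented, so the following is only a hypothetical sketch: the function name, the return shape, and especially the cut-off `MM_THRESHOLD` are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical cut-off, NOT taken from the report: a value above this
# is assumed to have been recorded in millimetres rather than centimetres.
MM_THRESHOLD = 300.0


def correct_waist_unit(value: Optional[float]) -> Tuple[Optional[float], bool]:
    """Return (value in cm, correction_applied) for one measurement."""
    if value is None:
        # Missing measurements pass through untouched and unflagged.
        return None, False
    if value > MM_THRESHOLD:
        # Treat as millimetres and convert to centimetres.
        return value / 10.0, True
    return value, False


print(correct_waist_unit(85.0))
print(correct_waist_unit(912.0))
```

A flag column built this way would be constant whenever no value crosses the cut-off, which would match a flag that reads False for every row.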

Alerts

[Constant] study_source has constant value "JHB_WRHI_003"
[Constant] province has constant value "Gauteng"
[Constant] city has constant value "Johannesburg"
[Constant] HIV_status has constant value "Positive"
[Constant] Antiretroviral Therapy Status has constant value "Positive"
[Constant] cd4_correction_applied has constant value "0.0"
[Constant] final_comprehensive_fix_applied has constant value "1.0"
[Constant] waist_circ_unit_correction_applied has constant value "False"
[High correlation] ALT (U/L) is highly overall correlated with AST (U/L)
[High correlation] AST (U/L) is highly overall correlated with ALT (U/L)
[High correlation] CD4 cell count (cells/µL) is highly overall correlated with White blood cell count (×10³/µL)
[High correlation] HIV viral load (copies/mL) is highly overall correlated with Patient ID
[High correlation] Patient ID is highly overall correlated with HIV viral load (copies/mL) and 5 other fields
[High correlation] Race is highly overall correlated with Patient ID
[High correlation] Sex is highly overall correlated with Patient ID and 1 other fields
[High correlation] White blood cell count (×10³/µL) is highly overall correlated with CD4 cell count (cells/µL)
[High correlation] hemoglobin_g_dL is highly overall correlated with Sex
[High correlation] jhb_subregion is highly overall correlated with Patient ID and 2 other fields
[High correlation] latitude is highly overall correlated with Patient ID and 2 other fields
[High correlation] longitude is highly overall correlated with Patient ID and 2 other fields
[Imbalance] Race is highly imbalanced (95.8%)
[Imbalance] HIV viral load (copies/mL) is highly imbalanced (61.9%)
[Missing] CD4 cell count (cells/µL) has 4 (1.8%) missing values
[Unique] anonymous_patient_id has unique values
[Unique] Patient ID has unique values
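The Constant and Unique alerts can be understood as simple distinct-value checks. The sketch below is not the profiler's own code; `classify_columns` is a hypothetical helper and the three-row toy table is invented for illustration, reusing value spellings that appear in this report.

```python
from typing import Dict, List


def classify_columns(columns: Dict[str, List[object]]) -> Dict[str, str]:
    """Label each column Constant, Unique, or '-' from its distinct count."""
    labels = {}
    for name, values in columns.items():
        distinct = len(set(values))
        if distinct == 1:
            labels[name] = "Constant"   # one value repeated in every row
        elif distinct == len(values):
            labels[name] = "Unique"     # no value appears twice
        else:
            labels[name] = "-"
    return labels


# Invented toy rows, for illustration only.
toy = {
    "city": ["Johannesburg", "Johannesburg", "Johannesburg"],
    "Patient ID": [1, 3, 4],
    "Sex": ["Female", "Female", "Male"],
}
labels = classify_columns(toy)
print(labels)
```

Columns flagged Constant carry no information for modelling, and columns flagged Unique behave as identifiers, so correlations reported against them are usually best read as artefacts of row order rather than as findings.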

Reproduction

Analysis started: 2025-11-24 21:50:06.015012
Analysis finished: 2025-11-24 21:50:12.695530
Duration: 6.68 seconds
Software version: ydata-profiling v4.18.0
Download configuration: config.json
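The reported duration can be checked from the two timestamps. A minimal sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2025-11-24 21:50:06.015012")
finished = datetime.fromisoformat("2025-11-24 21:50:12.695530")

# Subtracting two datetimes yields a timedelta.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```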

Variables

anonymous_patient_id
Text

Unique 

Distinct217
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Memory size15.7 KiB

Length

Max length17
Median length17
Mean length17
Min length17

Characters and Unicode

Total characters3689
Distinct characters19
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
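As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point (for example `Lu` for Uppercase Letter, `Nd` for Decimal Number, `Pc` for Connector Punctuation). The helper name `category_counts` is an assumption for this sketch; the sample string is the first sample row shown for this variable.

```python
import unicodedata
from collections import Counter
from typing import Iterable


def category_counts(values: Iterable[str]) -> Counter:
    """Count Unicode general categories over all characters in values."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


counts = category_counts(["HEAT_329E55DDD278"])
print(dict(counts))
```

Script and block membership, which the report also tabulates, are not available from `unicodedata` and would need another source.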

Unique

Unique217 ?
Unique (%)100.0%

Sample

1st rowHEAT_329E55DDD278
2nd rowHEAT_C8A77DD97D98
3rd rowHEAT_A4407F8E079E
4th rowHEAT_7DCC7C7F1641
5th rowHEAT_32253618AEF8
Value                Count  Frequency (%)
heat_329e55ddd278    1      0.5%
heat_b378f883c50b    1      0.5%
heat_133c575ec479    1      0.5%
heat_a4407f8e079e    1      0.5%
heat_7dcc7c7f1641    1      0.5%
heat_32253618aef8    1      0.5%
heat_5c22fd95bf09    1      0.5%
heat_a5dc9507fdda    1      0.5%
heat_605625b419c8    1      0.5%
heat_931a30042daf    1      0.5%
Other values (207)   207    95.4%

Most occurring characters

ValueCountFrequency (%)
A377
 
10.2%
E357
 
9.7%
H217
 
5.9%
T217
 
5.9%
_217
 
5.9%
8191
 
5.2%
6179
 
4.9%
4177
 
4.8%
F174
 
4.7%
5174
 
4.7%
Other values (9)1409
38.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1817
49.3%
Decimal Number1655
44.9%
Connector Punctuation217
 
5.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
8191
11.5%
6179
10.8%
4177
10.7%
5174
10.5%
2165
10.0%
0163
9.8%
7153
9.2%
1152
9.2%
3152
9.2%
9149
9.0%
Uppercase Letter
ValueCountFrequency (%)
A377
20.7%
E357
19.6%
H217
11.9%
T217
11.9%
F174
9.6%
B165
9.1%
D157
8.6%
C153
8.4%
Connector Punctuation
ValueCountFrequency (%)
_217
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1872
50.7%
Latin1817
49.3%

Most frequent character per script

Common
ValueCountFrequency (%)
_217
11.6%
8191
10.2%
6179
9.6%
4177
9.5%
5174
9.3%
2165
8.8%
0163
8.7%
7153
8.2%
1152
8.1%
3152
8.1%
Latin
ValueCountFrequency (%)
A377
20.7%
E357
19.6%
H217
11.9%
T217
11.9%
F174
9.6%
B165
9.1%
D157
8.6%
C153
8.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII3689
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A377
 
10.2%
E357
 
9.7%
H217
 
5.9%
T217
 
5.9%
_217
 
5.9%
8191
 
5.2%
6179
 
4.9%
4177
 
4.8%
F174
 
4.7%
5174
 
4.7%
Other values (9)1409
38.2%

Patient ID
Real number (ℝ)

High correlation  Unique 

Distinct217
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean173.52995
Minimum1
Maximum351
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.4 KiB

Quantile statistics

Minimum1
5-th percentile14.6
Q187
median175
Q3260
95-th percentile333.4
Maximum351
Range350
Interquartile range (IQR)173

Descriptive statistics

Standard deviation101.83674
Coefficient of variation (CV)0.58685398
Kurtosis-1.1196307
Mean173.52995
Median Absolute Deviation (MAD)86
Skewness0.0021213009
Sum37656
Variance10370.722
MonotonicityStrictly increasing
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
1                    1      0.5%
234                  1      0.5%
215                  1      0.5%
217                  1      0.5%
220                  1      0.5%
222                  1      0.5%
223                  1      0.5%
224                  1      0.5%
227                  1      0.5%
228                  1      0.5%
Other values (207)   207    95.4%

Value                Count  Frequency (%)
1                    1      0.5%
3                    1      0.5%
4                    1      0.5%
5                    1      0.5%
6                    1      0.5%
7                    1      0.5%
8                    1      0.5%
9                    1      0.5%
10                   1      0.5%
12                   1      0.5%

Value                Count  Frequency (%)
351                  1      0.5%
350                  1      0.5%
349                  1      0.5%
348                  1      0.5%
347                  1      0.5%
346                  1      0.5%
345                  1      0.5%
342                  1      0.5%
338                  1      0.5%
337                  1      0.5%
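Several of the Patient ID statistics are derived from others. The sketch below recomputes range, interquartile range, mean, coefficient of variation, and variance from the reported figures; small differences in the last digits are expected because the inputs are already rounded.

```python
# Figures copied from the Patient ID statistics above.
n = 217
value_sum = 37656
minimum, maximum = 1, 351
q1, q3 = 87, 260
std = 101.83674

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
mean = value_sum / n              # Mean
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr)
print(f"mean={mean:.5f} cv={cv:.6f} variance={variance:.2f}")
```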

study_source
Categorical

Constant 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size14.6 KiB
JHB_WRHI_003
217 

Length

Max length12
Median length12
Mean length12
Min length12

Characters and Unicode

Total characters2604
Distinct characters9
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJHB_WRHI_003
2nd rowJHB_WRHI_003
3rd rowJHB_WRHI_003
4th rowJHB_WRHI_003
5th rowJHB_WRHI_003

Common Values

ValueCountFrequency (%)
JHB_WRHI_003217
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
jhb_wrhi_003217
100.0%

Most occurring characters

ValueCountFrequency (%)
H434
16.7%
_434
16.7%
0434
16.7%
J217
8.3%
B217
8.3%
W217
8.3%
R217
8.3%
I217
8.3%
3217
8.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1519
58.3%
Decimal Number651
25.0%
Connector Punctuation434
 
16.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
H434
28.6%
J217
14.3%
B217
14.3%
W217
14.3%
R217
14.3%
I217
14.3%
Decimal Number
ValueCountFrequency (%)
0434
66.7%
3217
33.3%
Connector Punctuation
ValueCountFrequency (%)
_434
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1519
58.3%
Common1085
41.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
H434
28.6%
J217
14.3%
B217
14.3%
W217
14.3%
R217
14.3%
I217
14.3%
Common
ValueCountFrequency (%)
_434
40.0%
0434
40.0%
3217
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2604
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
H434
16.7%
_434
16.7%
0434
16.7%
J217
8.3%
B217
8.3%
W217
8.3%
R217
8.3%
I217
8.3%
3217
8.3%
Distinct112
Distinct (%)51.6%
Missing0
Missing (%)0.0%
Memory size3.4 KiB
Minimum2016-07-19 00:00:00
Maximum2017-06-15 00:00:00
Invalid dates0
Invalid dates (%)0.0%
Histogram with fixed size bins (bins=50)

Age (at enrolment)
Real number (ℝ)

Patient age at study enrollment

Distinct39
Distinct (%)18.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean41.663594
Minimum20
Maximum67
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.4 KiB

Quantile statistics

Minimum20
5-th percentile30
Q136
median40
Q347
95-th percentile56.2
Maximum67
Range47
Interquartile range (IQR)11

Descriptive statistics

Standard deviation8.0984802
Coefficient of variation (CV)0.19437786
Kurtosis-0.0090547417
Mean41.663594
Median Absolute Deviation (MAD)6
Skewness0.40369989
Sum9041
Variance65.585381
MonotonicityNot monotonic
Histogram with fixed size bins (bins=39)
Value                Count  Frequency (%)
40                   16     7.4%
39                   15     6.9%
46                   13     6.0%
34                   13     6.0%
37                   13     6.0%
35                   11     5.1%
42                   11     5.1%
38                   10     4.6%
44                   9      4.1%
49                   7      3.2%
Other values (29)    99     45.6%

Value                Count  Frequency (%)
20                   1      0.5%
25                   1      0.5%
26                   2      0.9%
27                   1      0.5%
28                   2      0.9%
29                   1      0.5%
30                   7      3.2%
31                   6      2.8%
32                   1      0.5%
33                   5      2.3%

Value                Count  Frequency (%)
67                   1      0.5%
63                   1      0.5%
62                   1      0.5%
61                   1      0.5%
58                   2      0.9%
57                   5      2.3%
56                   1      0.5%
55                   4      1.8%
54                   7      3.2%
53                   4      1.8%

Sex
Categorical

High correlation 

Distinct2
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size13.2 KiB
Female
153 
Male
64 

Length

Max length6
Median length6
Mean length5.4101382
Min length4
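The non-integer mean length follows from the two category counts for this variable (Female 153, Male 64). A short sketch of the arithmetic, which also yields the total character count the report lists for the variable:

```python
# Category counts as reported for the Sex variable.
counts = {"Female": 153, "Male": 64}

# Each value contributes its string length once per occurrence.
total_characters = sum(len(value) * count for value, count in counts.items())
n_rows = sum(counts.values())
mean_length = total_characters / n_rows

print(total_characters, n_rows)
print(f"{mean_length:.7f}")
```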

Characters and Unicode

Total characters1174
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowFemale
2nd rowFemale
3rd rowFemale
4th rowFemale
5th rowMale

Common Values

ValueCountFrequency (%)
Female153
70.5%
Male64
29.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
female153
70.5%
male64
29.5%

Most occurring characters

ValueCountFrequency (%)
e370
31.5%
a217
18.5%
l217
18.5%
F153
13.0%
m153
13.0%
M64
 
5.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter957
81.5%
Uppercase Letter217
 
18.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e370
38.7%
a217
22.7%
l217
22.7%
m153
16.0%
Uppercase Letter
ValueCountFrequency (%)
F153
70.5%
M64
29.5%

Most occurring scripts

ValueCountFrequency (%)
Latin1174
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e370
31.5%
a217
18.5%
l217
18.5%
F153
13.0%
m153
13.0%
M64
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1174
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e370
31.5%
a217
18.5%
l217
18.5%
F153
13.0%
m153
13.0%
M64
 
5.5%

Race
Categorical

High correlation  Imbalance 

Distinct2
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size13.1 KiB
Black
216 
Mixed Race
 
1

Length

Max length10
Median length5
Mean length5.0230415
Min length5

Characters and Unicode

Total characters1090
Distinct characters12
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique1 ?
Unique (%)0.5%

Sample

1st rowBlack
2nd rowBlack
3rd rowBlack
4th rowBlack
5th rowBlack

Common Values

ValueCountFrequency (%)
Black216
99.5%
Mixed Race1
 
0.5%
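The imbalance percentages in the alerts (95.8% for Race, 61.9% for HIV viral load) are consistent with one minus the Shannon entropy of the class counts, normalized by the logarithm of the number of classes. Treating that as the formula is an assumption: the report itself does not state how the score is computed, and `imbalance_score` is a hypothetical helper name.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """1 - normalized Shannon entropy; 0 is balanced, 1 is fully imbalanced."""
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class has no balance to measure
    total = sum(counts)
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts if c > 0
    )
    # log2(n_classes) is the entropy of a perfectly balanced variable.
    return 1.0 - entropy / math.log2(n_classes)


# Class counts as reported for Race and for HIV viral load.
print(f"Race: {imbalance_score([216, 1]):.1%}")
print(f"HIV viral load: {imbalance_score([176, 39, 1, 1]):.1%}")
```

Under this reading, a single Mixed Race record among 217 is enough to push Race close to the maximum score.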

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
black216
99.1%
mixed1
 
0.5%
race1
 
0.5%

Most occurring characters

ValueCountFrequency (%)
a217
19.9%
c217
19.9%
B216
19.8%
l216
19.8%
k216
19.8%
e2
 
0.2%
M1
 
0.1%
i1
 
0.1%
x1
 
0.1%
d1
 
0.1%
Other values (2)2
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter871
79.9%
Uppercase Letter218
 
20.0%
Space Separator1
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a217
24.9%
c217
24.9%
l216
24.8%
k216
24.8%
e2
 
0.2%
i1
 
0.1%
x1
 
0.1%
d1
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
B216
99.1%
M1
 
0.5%
R1
 
0.5%
Space Separator
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1089
99.9%
Common1
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a217
19.9%
c217
19.9%
B216
19.8%
l216
19.8%
k216
19.8%
e2
 
0.2%
M1
 
0.1%
i1
 
0.1%
x1
 
0.1%
d1
 
0.1%
Common
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1090
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a217
19.9%
c217
19.9%
B216
19.8%
l216
19.8%
k216
19.8%
e2
 
0.2%
M1
 
0.1%
i1
 
0.1%
x1
 
0.1%
d1
 
0.1%
Other values (2)2
 
0.2%

latitude
Categorical

High correlation 

Distinct2
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size13.8 KiB
-26.2041
190 
-26.2309
27 

Length

Max length8
Median length8
Mean length8
Min length8

Characters and Unicode

Total characters1736
Distinct characters9
Distinct categories3 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row-26.2041
2nd row-26.2041
3rd row-26.2041
4th row-26.2041
5th row-26.2309

Common Values

ValueCountFrequency (%)
-26.2041190
87.6%
-26.230927
 
12.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
26.2041190
87.6%
26.230927
 
12.4%

Most occurring characters

ValueCountFrequency (%)
2434
25.0%
-217
12.5%
6217
12.5%
.217
12.5%
0217
12.5%
4190
10.9%
1190
10.9%
327
 
1.6%
927
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1302
75.0%
Dash Punctuation217
 
12.5%
Other Punctuation217
 
12.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2434
33.3%
6217
16.7%
0217
16.7%
4190
14.6%
1190
14.6%
327
 
2.1%
927
 
2.1%
Dash Punctuation
ValueCountFrequency (%)
-217
100.0%
Other Punctuation
ValueCountFrequency (%)
.217
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1736
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2434
25.0%
-217
12.5%
6217
12.5%
.217
12.5%
0217
12.5%
4190
10.9%
1190
10.9%
327
 
1.6%
927
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII1736
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2434
25.0%
-217
12.5%
6217
12.5%
.217
12.5%
0217
12.5%
4190
10.9%
1190
10.9%
327
 
1.6%
927
 
1.6%

longitude
Categorical

High correlation 

Distinct3
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Memory size13.6 KiB
28.0473
172 
27.8585
27 
27.9394
18 

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters1519
Distinct characters9
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row28.0473
2nd row28.0473
3rd row28.0473
4th row27.9394
5th row27.8585

Common Values

ValueCountFrequency (%)
28.0473172
79.3%
27.858527
 
12.4%
27.939418
 
8.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
28.0473172
79.3%
27.858527
 
12.4%
27.939418
 
8.3%

Most occurring characters

ValueCountFrequency (%)
8226
14.9%
2217
14.3%
.217
14.3%
7217
14.3%
4190
12.5%
3190
12.5%
0172
11.3%
554
 
3.6%
936
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1302
85.7%
Other Punctuation217
 
14.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
8226
17.4%
2217
16.7%
7217
16.7%
4190
14.6%
3190
14.6%
0172
13.2%
554
 
4.1%
936
 
2.8%
Other Punctuation
ValueCountFrequency (%)
.217
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1519
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
8226
14.9%
2217
14.3%
.217
14.3%
7217
14.3%
4190
12.5%
3190
12.5%
0172
11.3%
554
 
3.6%
936
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII1519
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
8226
14.9%
2217
14.3%
.217
14.3%
7217
14.3%
4190
12.5%
3190
12.5%
0172
11.3%
554
 
3.6%
936
 
2.4%

province
Categorical

Constant 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size13.6 KiB
Gauteng
217 

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters1519
Distinct characters7
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowGauteng
2nd rowGauteng
3rd rowGauteng
4th rowGauteng
5th rowGauteng

Common Values

ValueCountFrequency (%)
Gauteng217
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
gauteng217
100.0%

Most occurring characters

ValueCountFrequency (%)
G217
14.3%
a217
14.3%
u217
14.3%
t217
14.3%
e217
14.3%
n217
14.3%
g217
14.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1302
85.7%
Uppercase Letter217
 
14.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a217
16.7%
u217
16.7%
t217
16.7%
e217
16.7%
n217
16.7%
g217
16.7%
Uppercase Letter
ValueCountFrequency (%)
G217
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1519
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
G217
14.3%
a217
14.3%
u217
14.3%
t217
14.3%
e217
14.3%
n217
14.3%
g217
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1519
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
G217
14.3%
a217
14.3%
u217
14.3%
t217
14.3%
e217
14.3%
n217
14.3%
g217
14.3%

city
Categorical

Constant 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size14.6 KiB
Johannesburg
217 

Length

Max length12
Median length12
Mean length12
Min length12

Characters and Unicode

Total characters2604
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJohannesburg
2nd rowJohannesburg
3rd rowJohannesburg
4th rowJohannesburg
5th rowJohannesburg

Common Values

ValueCountFrequency (%)
Johannesburg217
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
johannesburg217
100.0%

Most occurring characters

ValueCountFrequency (%)
n434
16.7%
J217
8.3%
o217
8.3%
h217
8.3%
a217
8.3%
e217
8.3%
s217
8.3%
b217
8.3%
u217
8.3%
r217
8.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2387
91.7%
Uppercase Letter217
 
8.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n434
18.2%
o217
9.1%
h217
9.1%
a217
9.1%
e217
9.1%
s217
9.1%
b217
9.1%
u217
9.1%
r217
9.1%
g217
9.1%
Uppercase Letter
ValueCountFrequency (%)
J217
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2604
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n434
16.7%
J217
8.3%
o217
8.3%
h217
8.3%
a217
8.3%
e217
8.3%
s217
8.3%
b217
8.3%
u217
8.3%
r217
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII2604
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n434
16.7%
J217
8.3%
o217
8.3%
h217
8.3%
a217
8.3%
e217
8.3%
s217
8.3%
b217
8.3%
u217
8.3%
r217
8.3%

jhb_subregion
Categorical

High correlation 

Distinct2
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size14.4 KiB
Central_JHB
172 
Western_JHB
45 

Length

Max length11
Median length11
Mean length11
Min length11

Characters and Unicode

Total characters2387
Distinct characters13
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCentral_JHB
2nd rowCentral_JHB
3rd rowCentral_JHB
4th rowWestern_JHB
5th rowWestern_JHB

Common Values

ValueCountFrequency (%)
Central_JHB172
79.3%
Western_JHB45
 
20.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
central_jhb172
79.3%
western_jhb45
 
20.7%

Most occurring characters

ValueCountFrequency (%)
e262
11.0%
n217
9.1%
t217
9.1%
r217
9.1%
_217
9.1%
J217
9.1%
H217
9.1%
B217
9.1%
C172
7.2%
a172
7.2%
Other values (3)262
11.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1302
54.5%
Uppercase Letter868
36.4%
Connector Punctuation217
 
9.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e262
20.1%
n217
16.7%
t217
16.7%
r217
16.7%
a172
13.2%
l172
13.2%
s45
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
J217
25.0%
H217
25.0%
B217
25.0%
C172
19.8%
W45
 
5.2%
Connector Punctuation
ValueCountFrequency (%)
_217
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2170
90.9%
Common217
 
9.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e262
12.1%
n217
10.0%
t217
10.0%
r217
10.0%
J217
10.0%
H217
10.0%
B217
10.0%
C172
7.9%
a172
7.9%
l172
7.9%
Other values (2)90
 
4.1%
Common
ValueCountFrequency (%)
_217
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2387
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e262
11.0%
n217
9.1%
t217
9.1%
r217
9.1%
_217
9.1%
J217
9.1%
H217
9.1%
B217
9.1%
C172
7.2%
a172
7.2%
Other values (3)262
11.0%

CD4 cell count (cells/µL)
Real number (ℝ)

High correlation  Missing 

CD4+ T lymphocyte count (missing codes removed)

Distinct194
Distinct (%)91.1%
Missing4
Missing (%)1.8%
Infinite0
Infinite (%)0.0%
Mean669.23944
Minimum90
Maximum1596
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.4 KiB

Quantile statistics

Minimum90
5-th percentile210.4
Q1496
median637
Q3885
95-th percentile1136.8
Maximum1596
Range1506
Interquartile range (IQR)389

Descriptive statistics

Standard deviation278.34576
Coefficient of variation (CV)0.41591357
Kurtosis0.1796013
Mean669.23944
Median Absolute Deviation (MAD)184
Skewness0.39364919
Sum142548
Variance77476.362
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
894                  3      1.4%
596                  3      1.4%
947                  3      1.4%
620                  2      0.9%
404                  2      0.9%
567                  2      0.9%
847                  2      0.9%
626                  2      0.9%
594                  2      0.9%
651                  2      0.9%
Other values (184)   190    87.6%
(Missing)            4      1.8%

Value                Count  Frequency (%)
90                   1      0.5%
121                  1      0.5%
128                  1      0.5%
138                  1      0.5%
154                  1      0.5%
160                  1      0.5%
165                  1      0.5%
177                  1      0.5%
178                  1      0.5%
190                  1      0.5%

Value                Count  Frequency (%)
1596                 1      0.5%
1501                 1      0.5%
1371                 1      0.5%
1339                 1      0.5%
1254                 1      0.5%
1239                 1      0.5%
1230                 1      0.5%
1187                 1      0.5%
1184                 1      0.5%
1178                 1      0.5%
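For CD4 cell count, the reported mean and distinct percentage only reconcile with the other figures if the 4 missing values are excluded from the denominator, while the missing percentage uses all 217 rows. The sketch below checks that reading against the reported numbers; it is an interpretation of the figures, not a statement of how the profiler is implemented.

```python
# Figures copied from the CD4 cell count statistics above.
n_observations = 217
n_missing = 4
n_distinct = 194
value_sum = 142548

# Mean and distinct share are taken over the non-missing rows only.
n_present = n_observations - n_missing
mean = value_sum / n_present
distinct_pct = 100 * n_distinct / n_present

# The missing share is taken over all rows.
missing_pct = 100 * n_missing / n_observations

print(n_present)
print(f"mean={mean:.5f} distinct={distinct_pct:.1f}% missing={missing_pct:.1f}%")
```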

HIV viral load (copies/mL)
Categorical

High correlation  Imbalance 

HIV RNA copies per mL (missing codes removed)

Distinct4
Distinct (%)1.8%
Missing0
Missing (%)0.0%
Memory size12.8 KiB
0.0
176 
40.0
39 
41.0
 
1
63.0
 
1

Length

Max length4
Median length3
Mean length3.1889401
Min length3

Characters and Unicode

Total characters692
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique2 ?
Unique (%)0.9%

Sample

1st row0.0
2nd row0.0
3rd row40.0
4th row0.0
5th row40.0

Common Values

ValueCountFrequency (%)
0.0176
81.1%
40.039
 
18.0%
41.01
 
0.5%
63.01
 
0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0.0176
81.1%
40.039
 
18.0%
41.01
 
0.5%
63.01
 
0.5%

Most occurring characters

ValueCountFrequency (%)
0432
62.4%
.217
31.4%
440
 
5.8%
11
 
0.1%
61
 
0.1%
31
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number475
68.6%
Other Punctuation217
31.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0432
90.9%
440
 
8.4%
11
 
0.2%
61
 
0.2%
31
 
0.2%
Other Punctuation
ValueCountFrequency (%)
.217
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common692
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0432
62.4%
.217
31.4%
440
 
5.8%
11
 
0.1%
61
 
0.1%
31
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII692
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0432
62.4%
.217
31.4%
440
 
5.8%
11
 
0.1%
61
 
0.1%
31
 
0.1%

HIV_status
Categorical

Constant 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size13.8 KiB
Positive
217 

Length

Max length8
Median length8
Mean length8
Min length8

Characters and Unicode

Total characters1736
Distinct characters7
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPositive
2nd rowPositive
3rd rowPositive
4th rowPositive
5th rowPositive

Common Values

ValueCountFrequency (%)
Positive217
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
positive217
100.0%

Most occurring characters

ValueCountFrequency (%)
i434
25.0%
P217
12.5%
o217
12.5%
s217
12.5%
t217
12.5%
v217
12.5%
e217
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1519
87.5%
Uppercase Letter217
 
12.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i434
28.6%
o217
14.3%
s217
14.3%
t217
14.3%
v217
14.3%
e217
14.3%
Uppercase Letter
ValueCountFrequency (%)
P217
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1736
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i434
25.0%
P217
12.5%
o217
12.5%
s217
12.5%
t217
12.5%
v217
12.5%
e217
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1736
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i434
25.0%
P217
12.5%
o217
12.5%
s217
12.5%
t217
12.5%
v217
12.5%
e217
12.5%

Antiretroviral Therapy Status
Categorical

Constant 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size13.8 KiB
Positive
217 

Length

Max length8
Median length8
Mean length8
Min length8

Characters and Unicode

Total characters1736
Distinct characters7
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPositive
2nd rowPositive
3rd rowPositive
4th rowPositive
5th rowPositive

Common Values

ValueCountFrequency (%)
Positive217
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
positive217
100.0%

Most occurring characters

ValueCountFrequency (%)
i434
25.0%
P217
12.5%
o217
12.5%
s217
12.5%
t217
12.5%
v217
12.5%
e217
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1519
87.5%
Uppercase Letter217
 
12.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i434
28.6%
o217
14.3%
s217
14.3%
t217
14.3%
v217
14.3%
e217
14.3%
Uppercase Letter
ValueCountFrequency (%)
P217
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1736
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i434
25.0%
P217
12.5%
o217
12.5%
s217
12.5%
t217
12.5%
v217
12.5%
e217
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1736
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i434
25.0%
P217
12.5%
o217
12.5%
s217
12.5%
t217
12.5%
v217
12.5%
e217
12.5%

White blood cell count (×10³/µL)
Real number (ℝ)

High correlation 

Distinct170
Distinct (%)78.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5.4964055
Minimum2.25
Maximum15.85
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.4 KiB

Quantile statistics

Minimum2.25
5-th percentile3.358
Q14.36
median5.21
Q36.5
95-th percentile8.102
Maximum15.85
Range13.6
Interquartile range (IQR)2.14

Descriptive statistics

Standard deviation1.7174402
Coefficient of variation (CV)0.31246607
Kurtosis8.8783306
Mean5.4964055
Median Absolute Deviation (MAD)0.99
Skewness1.8992561
Sum1192.72
Variance2.9496009
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
4.36                 3      1.4%
4.45                 3      1.4%
5.74                 3      1.4%
5.98                 3      1.4%
4.22                 3      1.4%
5.21                 2      0.9%
4.47                 2      0.9%
4.15                 2      0.9%
4.2                  2      0.9%
7.48                 2      0.9%
Other values (160)   192    88.5%

Value                Count  Frequency (%)
2.25                 1      0.5%
2.28                 1      0.5%
2.4                  1      0.5%
2.48                 1      0.5%
2.5                  1      0.5%
2.97                 1      0.5%
3.15                 1      0.5%
3.17                 1      0.5%
3.21                 1      0.5%
3.25                 1      0.5%

Value                Count  Frequency (%)
15.85                1      0.5%
14.98                1      0.5%
8.97                 1      0.5%
8.91                 1      0.5%
8.64                 1      0.5%
8.56                 1      0.5%
8.47                 2      0.9%
8.3                  1      0.5%
8.21                 1      0.5%
8.15                 1      0.5%

Platelet count (×10³/µL)
Real number (ℝ)

Platelet count (missing codes removed)

Distinct: 145
Distinct (%): 66.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 264.53456
Minimum: 110
Maximum: 588
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 110
5-th percentile: 171.8
Q1: 219
Median: 251
Q3: 306
95-th percentile: 385.4
Maximum: 588
Range: 478
Interquartile range (IQR): 87

Descriptive statistics

Standard deviation: 71.369474
Coefficient of variation (CV): 0.26979262
Kurtosis: 2.7657312
Mean: 264.53456
Median Absolute Deviation (MAD): 44
Skewness: 1.1693336
Sum: 57404
Variance: 5093.6018
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
207 | 4 | 1.8%
234 | 4 | 1.8%
230 | 4 | 1.8%
264 | 4 | 1.8%
245 | 4 | 1.8%
164 | 3 | 1.4%
205 | 3 | 1.4%
237 | 3 | 1.4%
235 | 3 | 1.4%
219 | 3 | 1.4%
Other values (135) | 182 | 83.9%

Value | Count | Frequency (%)
110 | 1 | 0.5%
134 | 1 | 0.5%
141 | 1 | 0.5%
146 | 1 | 0.5%
148 | 1 | 0.5%
163 | 1 | 0.5%
164 | 3 | 1.4%
165 | 1 | 0.5%
171 | 1 | 0.5%
172 | 1 | 0.5%

Value | Count | Frequency (%)
588 | 1 | 0.5%
527 | 2 | 0.9%
477 | 1 | 0.5%
460 | 1 | 0.5%
447 | 1 | 0.5%
422 | 1 | 0.5%
403 | 1 | 0.5%
399 | 1 | 0.5%
390 | 1 | 0.5%
387 | 1 | 0.5%
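
The platelet histogram caption states "fixed size bins (bins=50)". The sketch below shows what that implies for the bin width, assuming the bins are equal-width intervals spanning the reported minimum (110) to maximum (588), as NumPy and Matplotlib do by default. The helper names `fixed_width_edges` and `bin_index` are hypothetical and are not taken from the profiling tool.

```python
def fixed_width_edges(lo, hi, bins):
    """Return the bins + 1 edges of equal-width bins covering [lo, hi]."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def bin_index(value, lo, hi, bins):
    """Return the 0-based bin a value falls into; the last bin includes hi."""
    width = (hi - lo) / bins
    return min(int((value - lo) / width), bins - 1)


# Minimum, maximum and bin count as reported for Platelet count.
edges = fixed_width_edges(110, 588, 50)
bin_width = edges[1] - edges[0]          # 478 / 50
median_bin = bin_index(251, 110, 588, 50)
print(round(bin_width, 2), median_bin)
```

Under these assumptions each bar of the histogram covers roughly 9.56 ×10³/µL.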

hemoglobin_g_dL
Real number (ℝ)

High correlation 

Distinct: 68
Distinct (%): 31.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.53871
Minimum: 7.6
Maximum: 17.7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 7.6
5-th percentile: 10.5
Q1: 12.5
Median: 13.5
Q3: 14.7
95-th percentile: 16.3
Maximum: 17.7
Range: 10.1
Interquartile range (IQR): 2.2

Descriptive statistics

Standard deviation: 1.753429
Coefficient of variation (CV): 0.12951227
Kurtosis: 0.35834819
Mean: 13.53871
Median Absolute Deviation (MAD): 1.1
Skewness: -0.28913593
Sum: 2937.9
Variance: 3.0745131
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
14 | 10 | 4.6%
14.1 | 9 | 4.1%
12.5 | 7 | 3.2%
12.9 | 7 | 3.2%
13 | 7 | 3.2%
13.3 | 7 | 3.2%
13.8 | 6 | 2.8%
15.5 | 6 | 2.8%
14.6 | 6 | 2.8%
13.2 | 6 | 2.8%
Other values (58) | 146 | 67.3%

Value | Count | Frequency (%)
7.6 | 1 | 0.5%
8.6 | 1 | 0.5%
9 | 1 | 0.5%
9.3 | 1 | 0.5%
9.7 | 1 | 0.5%
9.9 | 1 | 0.5%
10.1 | 3 | 1.4%
10.4 | 1 | 0.5%
10.5 | 3 | 1.4%
10.6 | 1 | 0.5%

Value | Count | Frequency (%)
17.7 | 1 | 0.5%
17.6 | 1 | 0.5%
17.4 | 1 | 0.5%
17.3 | 1 | 0.5%
16.9 | 3 | 1.4%
16.5 | 1 | 0.5%
16.4 | 1 | 0.5%
16.3 | 5 | 2.3%
16 | 1 | 0.5%
15.9 | 3 | 1.4%

ALT (U/L)
Real number (ℝ)

High correlation 

Alanine aminotransferase (missing codes removed)

Distinct: 42
Distinct (%): 19.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.926267
Minimum: 6
Maximum: 98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 6
5-th percentile: 10
Q1: 14
Median: 17
Q3: 24
95-th percentile: 41
Maximum: 98
Range: 92
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 13.499798
Coefficient of variation (CV): 0.64511255
Kurtosis: 14.618813
Mean: 20.926267
Median Absolute Deviation (MAD): 5
Skewness: 3.3227683
Sum: 4541
Variance: 182.24454
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=42)
Value | Count | Frequency (%)
15 | 17 | 7.8%
14 | 16 | 7.4%
17 | 15 | 6.9%
12 | 14 | 6.5%
16 | 13 | 6.0%
10 | 11 | 5.1%
21 | 11 | 5.1%
13 | 10 | 4.6%
18 | 9 | 4.1%
20 | 8 | 3.7%
Other values (32) | 93 | 42.9%

Value | Count | Frequency (%)
6 | 2 | 0.9%
7 | 1 | 0.5%
8 | 2 | 0.9%
9 | 4 | 1.8%
10 | 11 | 5.1%
11 | 8 | 3.7%
12 | 14 | 6.5%
13 | 10 | 4.6%
14 | 16 | 7.4%
15 | 17 | 7.8%

Value | Count | Frequency (%)
98 | 2 | 0.9%
97 | 1 | 0.5%
71 | 1 | 0.5%
70 | 1 | 0.5%
64 | 1 | 0.5%
50 | 1 | 0.5%
46 | 2 | 0.9%
43 | 1 | 0.5%
41 | 2 | 0.9%
40 | 2 | 0.9%
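
The ALT distribution is strongly right-skewed (skewness 3.32, kurtosis 14.6), and its largest values lie far above Q3. One conventional way to quantify this is Tukey's 1.5 × IQR rule. The report itself does not apply this rule, so the sketch below is an illustration only. It uses the reported Q1 = 14 and Q3 = 24, and the function name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported for ALT (U/L).
lower, upper = tukey_fences(14, 24)

# (value, count) pairs copied from the list of the ten largest ALT values.
largest_alt = [(98, 2), (97, 1), (71, 1), (70, 1), (64, 1),
               (50, 1), (46, 2), (43, 1), (41, 2), (40, 2)]

# Keep the listed values that lie above the upper fence.
flagged = [(value, count) for value, count in largest_alt if value > upper]
n_flagged = sum(count for _, count in flagged)
print(lower, upper, n_flagged)
```

Under this rule the upper fence is 39 U/L, so all ten listed values (at least 14 observations) would be flagged as high. Whether they are clinically meaningful or data errors cannot be decided from the profile alone.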

AST (U/L)
Real number (ℝ)

High correlation 

Distinct: 34
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.705069
Minimum: 10
Maximum: 97
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 10
5-th percentile: 14.8
Q1: 18
Median: 21
Q3: 25
95-th percentile: 33.2
Maximum: 97
Range: 87
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 8.3258645
Coefficient of variation (CV): 0.36669629
Kurtosis: 30.009058
Mean: 22.705069
Median Absolute Deviation (MAD): 4
Skewness: 4.0488585
Sum: 4927
Variance: 69.32002
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=34)
Value | Count | Frequency (%)
22 | 19 | 8.8%
20 | 18 | 8.3%
21 | 18 | 8.3%
17 | 17 | 7.8%
23 | 14 | 6.5%
19 | 13 | 6.0%
15 | 13 | 6.0%
18 | 13 | 6.0%
25 | 11 | 5.1%
24 | 11 | 5.1%
Other values (24) | 70 | 32.3%

Value | Count | Frequency (%)
10 | 1 | 0.5%
12 | 3 | 1.4%
13 | 1 | 0.5%
14 | 6 | 2.8%
15 | 13 | 6.0%
16 | 8 | 3.7%
17 | 17 | 7.8%
18 | 13 | 6.0%
19 | 13 | 6.0%
20 | 18 | 8.3%

Value | Count | Frequency (%)
97 | 1 | 0.5%
55 | 1 | 0.5%
50 | 1 | 0.5%
48 | 1 | 0.5%
43 | 1 | 0.5%
41 | 1 | 0.5%
40 | 1 | 0.5%
39 | 1 | 0.5%
38 | 1 | 0.5%
35 | 1 | 0.5%

total_cholesterol_mg_dL
Real number (ℝ)

Distinct: 155
Distinct (%): 71.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.9320276
Minimum: 2.82
Maximum: 8.18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 KiB

Quantile statistics

Minimum: 2.82
5-th percentile: 3.43
Q1: 4.28
Median: 4.74
Q3: 5.53
95-th percentile: 6.698
Maximum: 8.18
Range: 5.36
Interquartile range (IQR): 1.25

Descriptive statistics

Standard deviation: 0.962825
Coefficient of variation (CV): 0.1952189
Kurtosis: -0.033836679
Mean: 4.9320276
Median Absolute Deviation (MAD): 0.61
Skewness: 0.41501086
Sum: 1070.25
Variance: 0.92703198
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
5.58 | 4 | 1.8%
3.74 | 3 | 1.4%
4.79 | 3 | 1.4%
4.07 | 3 | 1.4%
4.96 | 3 | 1.4%
6.04 | 3 | 1.4%
4.38 | 3 | 1.4%
4.18 | 3 | 1.4%
3.69 | 3 | 1.4%
5.32 | 3 | 1.4%
Other values (145) | 186 | 85.7%

Value | Count | Frequency (%)
2.82 | 1 | 0.5%
2.85 | 1 | 0.5%
2.89 | 1 | 0.5%
3.08 | 1 | 0.5%
3.19 | 1 | 0.5%
3.34 | 1 | 0.5%
3.39 | 2 | 0.9%
3.4 | 1 | 0.5%
3.43 | 3 | 1.4%
3.57 | 1 | 0.5%

Value | Count | Frequency (%)
8.18 | 1 | 0.5%
7.34 | 1 | 0.5%
7.09 | 2 | 0.9%
6.97 | 1 | 0.5%
6.82 | 1 | 0.5%
6.81 | 1 | 0.5%
6.79 | 1 | 0.5%
6.78 | 1 | 0.5%
6.73 | 2 | 0.9%
6.69 | 1 | 0.5%

cd4_correction_applied
Categorical

Constant 

Quality flag: CD4 missing codes removed

Distinct: 1
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB
0.0 | 217

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 651
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
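
As an illustration of how such character properties can be tallied, the sketch below uses Python's standard `unicodedata` module to count the general category of every character in a column whose 217 values are all the string "0.0", as in this variable. The function name `category_counts` is hypothetical, and the profiling tool may compute its figures differently.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters of all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The column is constant: 217 rows, each holding the string "0.0".
counts = category_counts(["0.0"] * 217)
print(counts)
```

The category codes "Nd" (Decimal Number) and "Po" (Other Punctuation) correspond to the two categories listed for this variable, with 434 and 217 characters respectively.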

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 217 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0.0 | 217 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 434 | 66.7%
. | 217 | 33.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 434 | 66.7%
Other Punctuation | 217 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 434 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
. | 217 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 651 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 434 | 66.7%
. | 217 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 651 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 434 | 66.7%
. | 217 | 33.3%

final_comprehensive_fix_applied
Categorical

Constant 

Quality flag: Comprehensive corrections applied

Distinct: 1
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB
1.0 | 217

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 651
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 217 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
1.0 | 217 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
1 | 217 | 33.3%
. | 217 | 33.3%
0 | 217 | 33.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 434 | 66.7%
Other Punctuation | 217 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 217 | 50.0%
0 | 217 | 50.0%
Other Punctuation
Value | Count | Frequency (%)
. | 217 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 651 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 217 | 33.3%
. | 217 | 33.3%
0 | 217 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 651 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 217 | 33.3%
. | 217 | 33.3%
0 | 217 | 33.3%

waist_circ_unit_correction_applied
Boolean

Constant 

Quality flag: Waist circ unit corrected

Distinct: 1
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB
False | 217
Value | Count | Frequency (%)
False | 217 | 100.0%

Interactions

[Interaction plots: 81 Matplotlib SVG panels, not reproduced in this text version.]

Correlations

Variable | ALT (U/L) | AST (U/L) | Age (at enrolment) | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | Patient ID | Platelet count (×10³/µL) | Race | Sex | White blood cell count (×10³/µL) | hemoglobin_g_dL | jhb_subregion | latitude | longitude | total_cholesterol_mg_dL
ALT (U/L) | 1.000 | 0.737 | -0.026 | 0.122 | 0.000 | -0.078 | -0.047 | 0.000 | 0.000 | 0.025 | 0.316 | 0.000 | 0.000 | 0.093 | -0.027
AST (U/L) | 0.737 | 1.000 | -0.011 | 0.086 | 0.000 | -0.097 | -0.060 | 0.000 | 0.236 | -0.104 | 0.169 | 0.027 | 0.159 | 0.068 | -0.065
Age (at enrolment) | -0.026 | -0.011 | 1.000 | 0.025 | 0.161 | 0.019 | -0.055 | 0.000 | 0.142 | -0.047 | 0.051 | 0.000 | 0.000 | 0.000 | 0.198
CD4 cell count (cells/µL) | 0.122 | 0.086 | 0.025 | 1.000 | 0.000 | -0.096 | 0.251 | 0.000 | 0.227 | 0.617 | -0.040 | 0.000 | 0.000 | 0.000 | -0.014
HIV viral load (copies/mL) | 0.000 | 0.000 | 0.161 | 0.000 | 1.000 | 1.000 | 0.293 | 0.000 | 0.000 | 0.055 | 0.117 | 0.000 | 0.000 | 0.000 | 0.000
Patient ID | -0.078 | -0.097 | 0.019 | -0.096 | 1.000 | 1.000 | -0.088 | 1.000 | 1.000 | 0.061 | -0.030 | 1.000 | 1.000 | 1.000 | -0.001
Platelet count (×10³/µL) | -0.047 | -0.060 | -0.055 | 0.251 | 0.293 | -0.088 | 1.000 | 0.000 | 0.346 | 0.369 | -0.340 | 0.000 | 0.000 | 0.000 | 0.119
Race | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Sex | 0.000 | 0.236 | 0.142 | 0.227 | 0.000 | 1.000 | 0.346 | 0.000 | 1.000 | 0.261 | 0.595 | 0.000 | 0.000 | 0.000 | 0.012
White blood cell count (×10³/µL) | 0.025 | -0.104 | -0.047 | 0.617 | 0.055 | 0.061 | 0.369 | 0.000 | 0.261 | 1.000 | 0.018 | 0.000 | 0.000 | 0.000 | 0.018
hemoglobin_g_dL | 0.316 | 0.169 | 0.051 | -0.040 | 0.117 | -0.030 | -0.340 | 0.000 | 0.595 | 0.018 | 1.000 | 0.048 | 0.074 | 0.000 | 0.150
jhb_subregion | 0.000 | 0.027 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.048 | 1.000 | 0.718 | 0.998 | 0.205
latitude | 0.000 | 0.159 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.074 | 0.718 | 1.000 | 0.998 | 0.133
longitude | 0.093 | 0.068 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.998 | 0.998 | 1.000 | 0.123
total_cholesterol_mg_dL | -0.027 | -0.065 | 0.198 | -0.014 | 0.000 | -0.001 | 0.119 | 0.000 | 0.012 | 0.018 | 0.150 | 0.205 | 0.133 | 0.123 | 1.000
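
The "High correlation" alerts in this report flag variable pairs with large coefficients in the matrix above. The sketch below shows one way to pick out such pairs, using a handful of coefficients copied from the matrix. The cutoff of 0.5 is an assumption made for illustration (the report does not state its threshold), and the function name `strong_pairs` is hypothetical.

```python
def strong_pairs(pairs, threshold=0.5):
    """Return (a, b, r) for pairs with |r| >= threshold, strongest first."""
    kept = [(a, b, r) for (a, b), r in pairs.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[2]), reverse=True)


# A subset of coefficients copied from the correlation matrix.
pairs = {
    ("ALT (U/L)", "AST (U/L)"): 0.737,
    ("CD4 cell count (cells/µL)", "White blood cell count (×10³/µL)"): 0.617,
    ("Sex", "hemoglobin_g_dL"): 0.595,
    ("Platelet count (×10³/µL)", "White blood cell count (×10³/µL)"): 0.369,
    ("Platelet count (×10³/µL)", "hemoglobin_g_dL"): -0.340,
    ("Age (at enrolment)", "total_cholesterol_mg_dL"): 0.198,
}

result = strong_pairs(pairs)
for a, b, r in result:
    print(f"{a} ~ {b}: {r:+.3f}")
```

With the assumed cutoff, the ALT/AST, CD4/white blood cell and Sex/hemoglobin pairs are retained, which is consistent with the variables that carry a "High correlation" tag in this report.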

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
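
To make the idea concrete, the sketch below builds a small nullity matrix (True where a cell is present) and per-column missing counts in plain Python. The toy rows and the column names `a` and `b` are hypothetical and are not taken from the dataset. The last line recomputes the overall missing-cell percentage from the figures in the report overview (4 missing cells, 217 observations, 25 variables).

```python
def nullity_matrix(rows, columns):
    """Return one list of booleans per row: True where the cell is present."""
    return [[row.get(col) is not None for col in columns] for row in rows]


def missing_per_column(rows, columns):
    """Count missing (None) cells in each column."""
    return {col: sum(row.get(col) is None for row in rows) for col in columns}


# Hypothetical toy rows, NOT taken from the dataset.
columns = ["a", "b"]
rows = [
    {"a": 1.0, "b": None},
    {"a": 2.0, "b": 3.0},
    {"a": None, "b": 4.0},
]

matrix = nullity_matrix(rows, columns)
missing = missing_per_column(rows, columns)

# Overall share of missing cells, using the counts from the report overview.
overall_pct = 100 * 4 / (217 * 25)
print(matrix, missing, round(overall_pct, 1))
```

The overall share works out to about 0.07%, which the overview rounds to 0.1%.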

Sample

 | anonymous_patient_id | Patient ID | study_source | primary_date | Age (at enrolment) | Sex | Race | latitude | longitude | province | city | jhb_subregion | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | HIV_status | Antiretroviral Therapy Status | White blood cell count (×10³/µL) | Platelet count (×10³/µL) | hemoglobin_g_dL | ALT (U/L) | AST (U/L) | total_cholesterol_mg_dL | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied
0 | HEAT_329E55DDD278 | 1 | JHB_WRHI_003 | 2016-07-19 | 30.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 1020.0 | 0.0 | Positive | Positive | 5.21 | 390.0 | 10.9 | 16.0 | 25.0 | 6.79 | 0.0 | 1.0 | False
1 | HEAT_C8A77DD97D98 | 3 | JHB_WRHI_003 | 2016-07-19 | 53.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 446.0 | 0.0 | Positive | Positive | 3.68 | 234.0 | 13.5 | 8.0 | 15.0 | 4.93 | 0.0 | 1.0 | False
2 | HEAT_A4407F8E079E | 4 | JHB_WRHI_003 | 2016-07-19 | 36.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 1054.0 | 40.0 | Positive | Positive | 7.71 | 344.0 | 13.3 | 17.0 | 17.0 | 5.19 | 0.0 | 1.0 | False
3 | HEAT_7DCC7C7F1641 | 5 | JHB_WRHI_003 | 2016-07-19 | 47.0 | Female | Black | -26.2041 | 27.9394 | Gauteng | Johannesburg | Western_JHB | 989.0 | 0.0 | Positive | Positive | 6.35 | 257.0 | 10.8 | 11.0 | 12.0 | 6.69 | 0.0 | 1.0 | False
4 | HEAT_32253618AEF8 | 6 | JHB_WRHI_003 | 2016-07-19 | 34.0 | Male | Black | -26.2309 | 27.8585 | Gauteng | Johannesburg | Western_JHB | 160.0 | 40.0 | Positive | Positive | 4.17 | 343.0 | 11.2 | 41.0 | 32.0 | 3.19 | 0.0 | 1.0 | False
5 | HEAT_5C22FD95BF09 | 7 | JHB_WRHI_003 | 2016-07-21 | 40.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 989.0 | 40.0 | Positive | Positive | 7.09 | 319.0 | 15.1 | 14.0 | 17.0 | 5.58 | 0.0 | 1.0 | False
6 | HEAT_A5DC9507FDDA | 8 | JHB_WRHI_003 | 2016-07-19 | 35.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 453.0 | 0.0 | Positive | Positive | 4.66 | 229.0 | 17.4 | 30.0 | 24.0 | 5.52 | 0.0 | 1.0 | False
7 | HEAT_605625B419C8 | 9 | JHB_WRHI_003 | 2016-07-21 | 30.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 288.0 | 0.0 | Positive | Positive | 4.12 | 240.0 | 17.7 | 64.0 | 43.0 | 4.56 | 0.0 | 1.0 | False
8 | HEAT_931A30042DAF | 10 | JHB_WRHI_003 | 2016-07-22 | 44.0 | Female | Black | -26.2309 | 27.8585 | Gauteng | Johannesburg | Western_JHB | 907.0 | 0.0 | Positive | Positive | 4.77 | 230.0 | 12.3 | 25.0 | 31.0 | 4.12 | 0.0 | 1.0 | False
9 | HEAT_3E8A9CB371EE | 12 | JHB_WRHI_003 | 2016-07-25 | 36.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 509.0 | 40.0 | Positive | Positive | 4.72 | 186.0 | 15.5 | 28.0 | 26.0 | 4.26 | 0.0 | 1.0 | False

 | anonymous_patient_id | Patient ID | study_source | primary_date | Age (at enrolment) | Sex | Race | latitude | longitude | province | city | jhb_subregion | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | HIV_status | Antiretroviral Therapy Status | White blood cell count (×10³/µL) | Platelet count (×10³/µL) | hemoglobin_g_dL | ALT (U/L) | AST (U/L) | total_cholesterol_mg_dL | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied
207 | HEAT_1DEA5323412B | 337 | JHB_WRHI_003 | 2017-05-10 | 46.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 190.0 | 40.0 | Positive | Positive | 4.36 | 225.0 | 14.3 | 40.0 | 30.0 | 4.62 | 0.0 | 1.0 | False
208 | HEAT_7F7338A00A97 | 338 | JHB_WRHI_003 | 2017-05-22 | 33.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 596.0 | 0.0 | Positive | Positive | 4.45 | 237.0 | 15.5 | 97.0 | 50.0 | 6.04 | 0.0 | 1.0 | False
209 | HEAT_9BE1B22421BC | 342 | JHB_WRHI_003 | 2017-05-18 | 43.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 541.0 | 40.0 | Positive | Positive | 4.58 | 403.0 | 12.8 | 27.0 | 20.0 | 6.24 | 0.0 | 1.0 | False
210 | HEAT_BE76BD19CBFC | 345 | JHB_WRHI_003 | 2017-05-17 | 42.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 530.0 | 0.0 | Positive | Positive | 6.31 | 245.0 | 17.6 | 39.0 | 22.0 | 6.73 | 0.0 | 1.0 | False
211 | HEAT_8EC86E61B084 | 346 | JHB_WRHI_003 | 2017-05-16 | 39.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 390.0 | 0.0 | Positive | Positive | 4.29 | 254.0 | 11.6 | 14.0 | 14.0 | 4.46 | 0.0 | 1.0 | False
212 | HEAT_43FDEDBF5846 | 347 | JHB_WRHI_003 | 2017-05-22 | 39.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 415.0 | 0.0 | Positive | Positive | 4.42 | 192.0 | 14.0 | 18.0 | 25.0 | 4.71 | 0.0 | 1.0 | False
213 | HEAT_751D406DFAF2 | 348 | JHB_WRHI_003 | 2017-06-06 | 57.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 786.0 | 63.0 | Positive | Positive | 8.15 | 447.0 | 11.3 | 16.0 | 17.0 | 6.03 | 0.0 | 1.0 | False
214 | HEAT_5A4AD8FB0DD2 | 349 | JHB_WRHI_003 | 2017-05-25 | 61.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 672.0 | 0.0 | Positive | Positive | 6.04 | 320.0 | 16.3 | 13.0 | 21.0 | 4.33 | 0.0 | 1.0 | False
215 | HEAT_345F925E036F | 350 | JHB_WRHI_003 | 2017-05-26 | 37.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 520.0 | 0.0 | Positive | Positive | 4.22 | 248.0 | 12.0 | 16.0 | 22.0 | 5.84 | 0.0 | 1.0 | False
216 | HEAT_797EB3CC686B | 351 | JHB_WRHI_003 | 2017-06-15 | 39.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 888.0 | 40.0 | Positive | Positive | 4.22 | 210.0 | 12.9 | 46.0 | 32.0 | 6.25 | 0.0 | 1.0 | False